[Cherry-Pick][Others] support arm env #8084

Open
BingooYang wants to merge 3 commits into PaddlePaddle:release/2.6 from BingooYang:support_gb


@BingooYang

Contributor

Motivation

Support ARM environments.

Modifications

Replace dependencies that have no ARM build:

  1. Replace decord with paddlecodec and adapt the calling code to paddlecodec
  2. Replace tool_helpers with fast_dataindex
  3. Add tests

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

PaddlePaddle-bot

This comment was marked as outdated.

@codecov-commenter

codecov-commenter commented Jun 29, 2026


Codecov Report

❌ Patch coverage is 95.06173% with 4 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.6@82c7c7a). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...y/input/ernie4_5_vl_processor/utils/video_utils.py | 94.73% | 2 Missing ⚠️ |
| fastdeploy/input/qwen3_vl_processor/process.py | 50.00% | 1 Missing ⚠️ |
| fastdeploy/input/qwen_vl_processor/process.py | 50.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@              Coverage Diff               @@
##             release/2.6    #8084   +/-   ##
==============================================
  Coverage               ?   71.70%           
==============================================
  Files                  ?      386           
  Lines                  ?    55835           
  Branches               ?     8768           
==============================================
  Hits                   ?    40038           
  Misses                 ?    12983           
  Partials               ?     2814           
Flag Coverage Δ
GPU 71.70% <95.06%> (?)


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-06-30 11:24:15

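📋 Review Summary

PR overview: switches the VL video-reading pipeline from decord to paddlecodec and replaces dependencies that are unavailable on ARM.
Scope of changes: fastdeploy/input/, requirements*.txt, tests/input/
Impact tag: [DataProcessor]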

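Issues

| Level | File | Summary |
|---|---|---|
| 🔴 Bug | fastdeploy/input/video_utils.py:127 | The default pinned paddlecodec==0.1.0 does not support VideoDecoder(seek_mode=...), so initializing the shared video reader fails |
| 🔴 Bug | fastdeploy/input/ernie4_5_vl_processor/utils/video_utils.py:108 | The Ernie-specific wrapper passes the same incompatible seek_mode argument |
| 🔴 Bug | fastdeploy/input/video_utils.py:164 | Non-GIF video bytes are converted to BytesIO before being passed to paddlecodec 0.1, so byte-stream input fails |
| 🔴 Bug | fastdeploy/input/ernie4_5_vl_processor/process_video.py:30 | The Ernie video-reading path also keeps the incompatible bytes -> BytesIO normalization |
| 🟡 Suggestion | tests/input/test_ernie_video_utils.py:243 | The new test file contains a direct-execution entry point flagged by checklist §C |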

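Status of Previous Findings

| Finding | Issue | Status |
|---|---|---|
| F1 | The shared video utilities dropped the read_video_decord export, breaking compatibility | ⚠️ Still present |
| F2 | The Ernie VL package-level exports dropped the read_video_decord name | ⚠️ Still present |
| F3 | The shared wrapper sets sys.modules["torchcodec"] = None, polluting the process-wide module cache | ⚠️ Still present |
| F4 | The Ernie-specific wrapper pollutes sys.modules["torchcodec"] in the same way | ⚠️ Still present |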
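
To illustrate why F3 and F4 matter: placing `None` in `sys.modules` under a module name makes every later import of that name fail for the whole process. The sketch below is not the PR's code. It uses the standard-library module `colorsys` purely as a stand-in for the blocked module, and shows one possible way to scope the block with `unittest.mock.patch.dict` so the cache is restored afterwards.

```python
import importlib
import sys
from unittest import mock

# Stand-in for the module being blocked (an assumption for this demo).
MODULE = "colorsys"


def can_import(name: str) -> bool:
    """Return True if `name` can currently be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


before = can_import(MODULE)

# A None entry in sys.modules makes the import machinery raise
# ModuleNotFoundError. patch.dict restores the original mapping on exit,
# so the block does not leak into the rest of the process.
with mock.patch.dict(sys.modules, {MODULE: None}):
    during = can_import(MODULE)

after = can_import(MODULE)

print(before, during, after)
```

Whether a scoped block is enough for the wrappers depends on why torchcodec is being hidden in the first place, which the review does not state.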

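📝 PR Convention Check

Not fully compliant. The target branch is release/2.6, but the title [Cherry-Pick][Others] support arm env does not append (#original PR number) as required by checklist §D1, and the actual impact is closer to [DataProcessor]. The Usage or Command and Accuracy Tests sections of the PR description are empty. The current context does not provide the original develop PR number, so no ready-to-copy full title is generated here, to avoid filling in a wrong number.

Suggested PR description (can be copied directly)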
## Motivation
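Support ARM environments: replace software that has no ARM build, and adapt the video-reading pipeline from decord to paddlecodec.

## Modifications
1. Switch the video-reading entry points of the Ernie/Qwen/Qwen3/PaddleOCR VL processors to the paddlecodec wrapper.
2. Use the paddlecodec/torchcodec VideoDecoder to adapt the existing `asnumpy()` frame-reading interface, and keep GIF-to-MP4 conversion support.
3. Add and update unit tests under `tests/input/` for the video utilities, the frame-sampling logic, and the VL processors.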

## Usage or Command
N/A

## Accuracy Tests
N/A (this PR only replaces the video-reading dependency and adapts the data processing; it does not change model computation accuracy.)

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

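Overall Assessment

The direction of the replacement matches the ARM-support goal, but the API contract between the paddlecodec wrapper and the default pinned dependency version is not aligned, and bytes video input regresses on both reading paths. Fix these runtime problems first, then continue with the dependency replacement.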

PADDLECODEC_NUM_THREADS = int(os.environ.get("PADDLECODEC_NUM_THREADS", 0))
self._decoder = VideoDecoder(
video_path,
seek_mode="exact",


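🔴 Bug: requirements.txt currently pins paddlecodec==0.1.0, but the constructor parameters that this version exposes through torchcodec.decoders.VideoDecoder do not include seek_mode, so this call raises TypeError when any shared video-reading path is initialized.

Suggested fix: either remove seek_mode="exact" and construct VideoDecoder with the parameters that 0.1.0 supports, or raise and pin paddlecodec in all requirements*.txt files to a version that supports the parameter. In either case, add a smoke test that really constructs the decoder, so that a mock cannot swallow an invalid argument.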
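
A minimal sketch of how a mock can swallow an invalid argument, as the comment above warns. `PinnedVideoDecoder` and `open_decoder` are hypothetical stand-ins invented for this example, not the paddlecodec API. The only library calls used are `unittest.mock.MagicMock` and `unittest.mock.create_autospec`.

```python
from unittest import mock


class PinnedVideoDecoder:
    """Hypothetical stand-in for a decoder whose constructor lacks seek_mode."""

    def __init__(self, source, num_threads=0):
        self.source = source
        self.num_threads = num_threads


def open_decoder(decoder_cls, source):
    # Mirrors the call shape under review: seek_mode is always passed.
    return decoder_cls(source, seek_mode="exact")


def fails_with_type_error(decoder_cls) -> bool:
    try:
        open_decoder(decoder_cls, "clip.mp4")
    except TypeError:
        return True
    return False


# A plain MagicMock accepts any arguments, so the bad kwarg goes unnoticed.
loose_result = fails_with_type_error(mock.MagicMock())

# create_autospec copies the constructor signature and rejects seek_mode.
strict_result = fails_with_type_error(mock.create_autospec(PinnedVideoDecoder))

# The real class fails the same way; this is what a smoke test would catch.
real_result = fails_with_type_error(PinnedVideoDecoder)

print(loose_result, strict_result, real_result)
```

An autospec'd mock only helps if it is built from the decoder class that is actually installed; a smoke test against the real package remains the more direct check.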

PADDLECODEC_NUM_THREADS = int(os.environ.get("PADDLECODEC_NUM_THREADS", 0))
self._decoder = VideoDecoder(
video_path,
seek_mode="exact",


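🔴 Bug: The Ernie-specific wrapper also passes seek_mode, which is unavailable when the default dependency is pinned to paddlecodec==0.1.0; read_video_paddlecodec() fails as soon as it creates the VideoDecoder.

Suggested fix: stay consistent with the shared fastdeploy.input.video_utils and remove the parameter, or raise and pin paddlecodec in every requirements file to a version that supports seek_mode. Also make the unit tests use a fake decoder that rejects unknown kwargs.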
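
One way to write a fake decoder that rejects unknown kwargs is to give it an explicit signature with no `**kwargs` catch-all. The sketch below is illustrative only: `StrictFakeDecoder`, `ReaderWrapper`, the `num_threads` parameter, and the injected `decoder_cls` argument are all hypothetical names, not taken from the PR.

```python
import unittest


class StrictFakeDecoder:
    """Hypothetical test double: explicit signature, no **kwargs catch-all."""

    def __init__(self, source, num_threads=0):
        self.source = source
        self.num_threads = num_threads


class ReaderWrapper:
    """Hypothetical wrapper; decoder_cls is injected so tests can swap it."""

    def __init__(self, source, decoder_cls, extra_kwargs=None):
        self._decoder = decoder_cls(source, **(extra_kwargs or {}))


class WrapperSignatureTest(unittest.TestCase):
    def test_unknown_kwarg_is_rejected(self):
        # Python itself raises TypeError for a keyword the fake does not declare.
        with self.assertRaises(TypeError):
            ReaderWrapper("clip.mp4", StrictFakeDecoder, {"seek_mode": "exact"})

    def test_supported_kwarg_is_accepted(self):
        wrapper = ReaderWrapper("clip.mp4", StrictFakeDecoder, {"num_threads": 2})
        self.assertEqual(wrapper._decoder.num_threads, 2)
```

The fake is only as trustworthy as its signature, so it should be kept in step with the constructor of the pinned decoder version.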



def read_video_decord(video_path, save_to_disk: bool = False):
def read_video_paddlecodec(video_path, save_to_disk: bool = False):


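🔴 Bug: This function keeps the old bytes -> io.BytesIO normalization, but the backend is no longer decord. The default pinned paddlecodec 0.1 VideoDecoder supports raw bytes but does not accept a non-GIF video passed as BytesIO, so when a caller passes a video byte stream, constructing the decoder inside the wrapper fails.

Suggested fix: do not convert ordinary bytes to BytesIO here. Pass the raw bytes straight to VideoReaderWrapper, and inside the wrapper transcode through a temporary file only for GIF bytes/BytesIO. A non-GIF BytesIO should likewise be read back into raw bytes, or written to a temporary file, before VideoDecoder is called.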
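
The normalization described above could look roughly like the following sketch. `normalize_video_source` is a hypothetical helper, not a function from the PR. GIF detection relies on the standard `GIF87a`/`GIF89a` file signatures, while the `.gif` suffix check for non-bytes sources is an assumption made for illustration.

```python
import io

# The two signatures that begin every GIF file.
GIF_MAGIC = (b"GIF87a", b"GIF89a")


def is_gif(data: bytes) -> bool:
    """Check the first six bytes against the GIF signatures."""
    return data[:6] in GIF_MAGIC


def normalize_video_source(source):
    """Return (payload, needs_gif_transcode).

    Raw bytes stay raw bytes, a BytesIO is read back into bytes, and
    anything else (for example a path string) passes through unchanged.
    """
    if isinstance(source, io.BytesIO):
        source = source.getvalue()
    if isinstance(source, (bytes, bytearray)):
        data = bytes(source)
        return data, is_gif(data)
    # Assumption: non-bytes sources are path-like and identified by suffix.
    return source, str(source).lower().endswith(".gif")


# Placeholder bytes for the demo; this is not a playable video.
payload, needs_transcode = normalize_video_source(io.BytesIO(b"\x00\x00\x00\x18ftypmp42"))
print(type(payload).__name__, needs_transcode)
```

Only sources flagged for transcoding would then take the temporary-file path; everything else reaches the decoder as raw bytes or as the original path.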


def read_video_decord(video_path, save_to_disk):
"""get reader and meta by decord"""
def read_video_paddlecodec(video_path, save_to_disk):


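🔴 Bug: The Ernie reading path also carries over the old decord adaptation of bytes -> io.BytesIO. After the switch to paddlecodec, non-GIF video bytes reach VideoDecoder as BytesIO, which the default 0.1 backend cannot recognize.

Suggested fix: keep the raw bytes here and pass them to VideoReaderWrapper, and move all GIF/non-GIF bytes handling into the wrapper. After the fix, add a test for non-GIF bytes input in which the fake decoder verifies that it receives bytes or a real temporary path rather than BytesIO.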
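
A fake decoder that validates what it receives might be sketched as follows. `TypeCheckingFakeDecoder` and `open_from_bytes` are hypothetical names for this example, and the payload is placeholder bytes rather than a playable video.

```python
import io
import os
import tempfile


class TypeCheckingFakeDecoder:
    """Hypothetical test double: accepts raw bytes or an existing file path."""

    def __init__(self, source):
        if isinstance(source, io.BytesIO):
            raise TypeError("BytesIO is not accepted; pass raw bytes or a path")
        if not isinstance(source, (bytes, str)):
            raise TypeError(f"unsupported source type: {type(source).__name__}")
        if isinstance(source, str) and not os.path.isfile(source):
            raise FileNotFoundError(source)
        self.source = source


def open_from_bytes(data: bytes, decoder_cls, spill_to_disk: bool = False):
    """Hand raw bytes to the decoder, or write them to a temp file first.

    Returns (decoder, temp_path). The caller must delete temp_path when
    it is not None.
    """
    if not spill_to_disk:
        return decoder_cls(data), None
    fd, path = tempfile.mkstemp(suffix=".mp4")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return decoder_cls(path), path


payload = b"\x00\x00\x00\x18ftypmp42"  # placeholder, not a real video

in_memory, _ = open_from_bytes(payload, TypeCheckingFakeDecoder)
on_disk, tmp_path = open_from_bytes(payload, TypeCheckingFakeDecoder, spill_to_disk=True)
spilled_ok = os.path.isfile(tmp_path) and on_disk.source == tmp_path
os.remove(tmp_path)
```

A test built on this double fails loudly if the reading path ever reintroduces the BytesIO conversion.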

self.assertFalse(os.path.exists(tmp_path))


if __name__ == "__main__":


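🟡 Suggestion: checklist §C explicitly treats a newly added if __name__ == "__main__" under test/** as a signal of an environment-dependent entry point. Repository tests should run uniformly through pytest/unittest discovery; keeping a direct entry point makes it easy for local direct runs and the CI collection path to diverge.

Suggested fix: delete this main guard and rely on the existing test-discovery mechanism to run the file.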
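
For illustration only, a check of this kind could be approximated with the standard `ast` module. `has_main_guard` is a hypothetical helper, not the checklist's actual tooling, and it looks only at top-level `if` statements of the exact form `__name__ == "__main__"`.

```python
import ast


def has_main_guard(source: str) -> bool:
    """Return True if the module has a top-level `if __name__ == "__main__":`."""
    tree = ast.parse(source)
    for node in tree.body:
        if not isinstance(node, ast.If):
            continue
        test = node.test
        if (
            isinstance(test, ast.Compare)
            and isinstance(test.left, ast.Name)
            and test.left.id == "__name__"
            and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)
            and len(test.comparators) == 1
            and isinstance(test.comparators[0], ast.Constant)
            and test.comparators[0].value == "__main__"
        ):
            return True
    return False


with_guard = 'import unittest\n\nif __name__ == "__main__":\n    unittest.main()\n'
without_guard = "import unittest\n\nclass T(unittest.TestCase):\n    pass\n"

print(has_main_guard(with_guard), has_main_guard(without_guard))
```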


3 participants